On likelihood ratio tests

Erich L. Lehmann

University of California at Berkeley 

Abstract: Likelihood ratio tests are intuitively appealing. Nevertheless, a 
number of examples are known in which they perform very poorly. The present 
paper discusses a large class of situations in which this is the case, and analyzes 
just how intuition misleads us; it also presents an alternative approach which 
in these situations is optimal. 

1. The popularity of likelihood ratio tests 

Faced with a new testing problem, statisticians most commonly turn to the likelihood ratio (LR) test. Introduced by Neyman and Pearson in 1928, it compares the maximum likelihood under the alternatives with that under the hypothesis. It owes its popularity to a number of facts.

(i) It is intuitively appealing. The likelihood of $\theta$, i.e. the probability density (or probability) of $x$ considered as a function of $\theta$, is widely considered a (relative) measure of support that the observation $x$ gives to the parameter $\theta$ (see for example Royall [8]). Then the likelihood ratio

$$\frac{\sup_{\text{alt}} p_\theta(x)}{\sup_{\text{hyp}} p_\theta(x)} \qquad (1.1)$$

compares the best explanation the data provide for the alternatives with the best explanation for the hypothesis. This seems quite persuasive.

(ii) In many standard problems, the LR test agrees with tests obtained from other 
principles (for example it is UMP unbiased or UMP invariant). Generally it seems 
to lead to satisfactory tests. However, counter-examples are also known in which 
the test is quite unsatisfactory; see for example Perlman and Wu [7] and Menendez, 
Rueda, and Salvador [6]. 

(iii) The LR test, under suitable conditions, has good asymptotic properties.

None of these three reasons is convincing.

(iii) tells us little about small samples. 

(i) has no strong logical grounding. 

(ii) is the most persuasive, but in these standard problems (in which there typically exists a complete set of sufficient statistics) all principles typically lead to tests that are the same or differ only slightly.
In view of the lack of theoretical support and the many counterexamples, it would be good to investigate LR tests systematically for small samples, a suggestion also made by Perlman and Wu [7]. The present paper attempts a first small step in this endeavor.

2. The case of two alternatives 

The simplest testing situation is that of testing a simple hypothesis against a simple 
alternative. Here the Neyman-Pearson Lemma completely vindicates the LR test,
which always provides the most powerful test. Note however that in this case no 
maximization is involved in either the numerator or denominator of (1.1), and as 
we shall see, it is just these maximizations that are questionable. 

The next simple situation is that of a simple hypothesis and two alternatives, 
and this is the case we shall now consider. 

Let $X = (X_1, \dots, X_n)$ where the $X$'s are iid. Without loss of generality suppose that under $H$ the $X$'s are uniformly distributed on $(0,1)$. Consider two alternatives $f, g$ on $(0,1)$. To simplify further, we shall assume that the alternatives are symmetric, i.e. that $g(x) = f(1-x)$, so that the joint densities are

$$p_1(x) = f(x_1) \cdots f(x_n), \qquad p_2(x) = f(1 - x_1) \cdots f(1 - x_n). \qquad (2.1)$$

Then it is natural to restrict attention to symmetric tests (this is the invariance principle), i.e. to rejection regions $R$ satisfying

$$(x_1, \dots, x_n) \in R \quad \text{if and only if} \quad (1 - x_1, \dots, 1 - x_n) \in R. \qquad (2.2)$$

The following result shows that under these assumptions there exists a uniformly most powerful (UMP) invariant test, i.e. a test that among all invariant tests maximizes the power against both $p_1$ and $p_2$.

Theorem 2.1. For testing $H$ against the alternatives (2.1) there exists among all level-$\alpha$ rejection regions $R$ satisfying (2.2) one that maximizes the power against both $p_1$ and $p_2$, and it rejects $H$ when

$$\tfrac{1}{2}\left[p_1(x) + p_2(x)\right] \text{ is sufficiently large.} \qquad (2.3)$$

We shall call the test (2.3) the average likelihood ratio test and from now on shall 
refer to (1.1) as the maximum likelihood ratio test. 

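Proof. If $R$ satisfies (2.2), its power against $p_1$ and $p_2$ must be the same. Hence

$$\int_R p_1 = \int_R p_2 = \int_R \tfrac{1}{2}(p_1 + p_2). \qquad (2.4)$$

By the Neyman-Pearson Lemma, the most powerful test of $H$ against $\tfrac{1}{2}(p_1 + p_2)$ rejects when (2.3) holds. Since this test itself satisfies (2.2), it maximizes the common power (2.4) among all invariant level-$\alpha$ tests. □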

Corollary 2.1. Under the assumptions of Theorem 2.1, the average LR test has power greater than or equal to that of the maximum likelihood ratio test against both $p_1$ and $p_2$.
Proof. The maximum LR test rejects when

$$\max\left(p_1(x), p_2(x)\right) \text{ is sufficiently large.} \qquad (2.5)$$

Since this test satisfies (2.2), the result follows. □
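For $n > 1$ the comparison in Corollary 2.1 can be checked by simulation. The Python sketch below is only an illustration under assumptions that are not in the text: it takes the hypothetical increasing, concave density $f(x) = \tfrac{3}{2}\sqrt{x}$, sample size $n = 5$ and $\alpha = 0.05$, estimates both critical values from uniform (null) samples, and estimates the power against $p_1$ (by the symmetry (2.2), the power against $p_2$ is the same). The function names are hypothetical.

```python
import numpy as np

def f(x):
    # Hypothetical alternative density on (0, 1): increasing and concave.
    return 1.5 * np.sqrt(x)

def sample_f(rng, size):
    # Inverse-CDF sampling from f: F(x) = x**1.5, so F^{-1}(u) = u**(2/3).
    return rng.uniform(size=size) ** (2.0 / 3.0)

def lr_statistics(x):
    """Average (2.3) and maximum (2.5) likelihood ratios for each row of x.

    The null density is 1 on (0, 1)^n, so the ratios are simply the
    alternative densities p1(x) = prod f(x_i) and p2(x) = prod f(1 - x_i)."""
    p1 = np.prod(f(x), axis=1)
    p2 = np.prod(f(1.0 - x), axis=1)
    return 0.5 * (p1 + p2), np.maximum(p1, p2)

def compare_power(n=5, alpha=0.05, reps=200_000, seed=0):
    rng = np.random.default_rng(seed)
    # Level-alpha critical values estimated from uniform (null) samples.
    avg0, max0 = lr_statistics(rng.uniform(size=(reps, n)))
    c_avg = np.quantile(avg0, 1.0 - alpha)
    c_max = np.quantile(max0, 1.0 - alpha)
    # Power against p1 (equal to the power against p2 by the symmetry (2.2)).
    avg1, max1 = lr_statistics(sample_f(rng, (reps, n)))
    return np.mean(avg1 > c_avg), np.mean(max1 > c_max)

power_avg, power_max = compare_power()
print(f"average LR test: {power_avg:.4f}, maximum LR test: {power_max:.4f}")
```

Because both the critical values and the powers are Monte Carlo estimates, the two powers agree with Corollary 2.1 only up to simulation error of a few tenths of a percent.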

The Corollary leaves open the possibility that the average and maximum LR tests have the same power; in particular they may coincide. To explore this possibility consider the case $n = 1$ and suppose that $f$ is increasing. Then the maximum likelihood ratio will be

$$f(x) \text{ if } x > \tfrac{1}{2} \quad \text{and} \quad f(1 - x) \text{ if } x < \tfrac{1}{2}. \qquad (2.6)$$

The maximum LR test will therefore reject when

$$\left|x - \tfrac{1}{2}\right| \text{ is sufficiently large,} \qquad (2.7)$$

i.e. when $x$ is close to either 0 or 1.

It turns out that the average LR test will depend on the shape of $f$, and we shall consider two cases: (a) $f$ is convex; (b) $f$ is concave.

Theorem 2.2. Under the assumptions of Theorem 2.1 and with $n = 1$,

(i) (a) if $f$ is convex, the average LR test rejects when (2.7) holds;
(b) if $f$ is concave, the average LR test rejects when

$$\left|x - \tfrac{1}{2}\right| \text{ is sufficiently small.} \qquad (2.8)$$

(ii) (a) if $f$ is convex, the maximum LR test coincides with the average LR test, and hence is UMP among all tests satisfying (2.2) for $n = 1$.
(b) if $f$ is concave, the maximum LR test uniformly minimizes the power among all tests satisfying (2.2) for $n = 1$, and therefore has power $\le \alpha$.

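Proof. This is an immediate consequence of the fact that if $x < x' < y' < y$ with $x + y = x' + y'$ (as for the pairs $(x, 1 - x)$ and $(x', 1 - x')$), then

$$\frac{f(x) + f(y)}{2} \text{ is } \geq \text{ or } \leq \frac{f(x') + f(y')}{2} \quad \text{as } f \text{ is convex or concave.} \qquad (2.9)$$

□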



It is clear from the argument that the superiority of the average over the maximum likelihood ratio test in the concave case will hold even if $p_1$ and $p_2$ are not exactly symmetric. Furthermore, it also holds if the two alternatives $p_1$ and $p_2$ are replaced by the family $\theta p_1 + (1 - \theta) p_2$, $0 < \theta < 1$.
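For $n = 1$ the powers of (2.7) and (2.8) can be computed exactly from the distribution function $F$ of $f$: at level $\alpha$, (2.7) rejects when $x < \alpha/2$ or $x > 1 - \alpha/2$, and (2.8) when $|x - \tfrac{1}{2}| < \alpha/2$. The sketch below illustrates Theorem 2.2 with two hypothetical densities that are not taken from the text: the convex $f(x) = 3x^2$ (so $F(x) = x^3$) and the concave $f(x) = \tfrac{3}{2}\sqrt{x}$ (so $F(x) = x^{3/2}$).

```python
def power_n1(F, alpha):
    """Exact power against p1 of the two symmetric level-alpha tests for n = 1.

    F is the distribution function of the alternative density f on (0, 1)."""
    h = alpha / 2.0
    # (2.7): reject when |x - 1/2| is large, i.e. x < h or x > 1 - h.
    outer = F(h) + 1.0 - F(1.0 - h)
    # (2.8): reject when |x - 1/2| is small, i.e. |x - 1/2| < h.
    inner = F(0.5 + h) - F(0.5 - h)
    return outer, inner

alpha = 0.05
print(power_n1(lambda x: x**3, alpha))    # convex f = 3x^2: about (0.0732, 0.0375)
print(power_n1(lambda x: x**1.5, alpha))  # concave f = 1.5 sqrt(x): about (0.0412, 0.0530)
```

In the convex case the outer region (2.7) is the better one; in the concave case the order reverses and the outer region, which is the maximum LR test, has power below $\alpha$.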



3. A finite number of alternatives 

The comparison of maximum and average likelihood ratio tests discussed in Section 2 for the case of two alternatives obtains much more generally. In the present section we shall sketch the corresponding result for the case of a simple hypothesis against a finite number of alternatives which exhibit a symmetry generalizing (2.1).

Suppose the densities of the simple hypothesis and the $s$ alternatives are denoted by $p_0, p_1, \dots, p_s$ and that there exists a group $G$ of transformations of the sample which leaves invariant both $p_0$ and the set $\{p_1, \dots, p_s\}$ (i.e. each transformation results in a permutation of $p_1, \dots, p_s$). Let $\bar{G}$ denote the set of these permutations and suppose that it is transitive over the set $\{p_1, \dots, p_s\}$, i.e. that given any $i$ and $j$ there exists a transformation in $\bar{G}$ taking $p_i$ into $p_j$. A rejection region $R$ is said to be invariant under $G$ if

$$x \in R \quad \text{if and only if} \quad g(x) \in R \quad \text{for all } g \text{ in } G. \qquad (3.1)$$

Theorem 3.1. Under these assumptions there exists a uniformly most powerful invariant test and it rejects when

$$\frac{\sum_{i=1}^{s} p_i(x)/s}{p_0(x)} \text{ is sufficiently large.} \qquad (3.2)$$

In generalization of the terminology of Theorem 2.1 we shall call (3.2) the average 
likelihood ratio test. The proof of Theorem 3.1 exactly parallels that of Theorem 2.1. 

The Theorem extends to the case where $G$ is a compact group. The average in the numerator of (3.2) is then replaced by the integral with respect to the (unique) invariant probability measure over $G$. For details see Eaton ([3], Chapter 4). A further extension is to the case where not only the alternatives but also the hypothesis is composite.

To illustrate Theorem 3.1, let us extend the case considered in Section 2. Let $(X, Y)$ have a bivariate distribution over the unit square which is uniform under $H$. Let $f$ be a density for $(X, Y)$ which is strictly increasing in both variables and consider the four alternatives

$$p_1 = f(x, y), \quad p_2 = f(1 - x, y), \quad p_3 = f(x, 1 - y), \quad p_4 = f(1 - x, 1 - y).$$

The group $G$ consists of the four transformations

$$g_1(x, y) = (x, y), \quad g_2(x, y) = (1 - x, y), \quad g_3(x, y) = (x, 1 - y), \quad g_4(x, y) = (1 - x, 1 - y).$$

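They induce in the space of $(p_1, \dots, p_4)$ the transformations:

$\bar{g}_1$ = the identity;
$\bar{g}_2$: $p_1 \to p_2$, $p_2 \to p_1$, $p_3 \to p_4$, $p_4 \to p_3$;
$\bar{g}_3$: $p_1 \to p_3$, $p_3 \to p_1$, $p_2 \to p_4$, $p_4 \to p_2$;
$\bar{g}_4$: $p_1 \to p_4$, $p_4 \to p_1$, $p_2 \to p_3$, $p_3 \to p_2$.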

This is clearly transitive, so that Theorem 3.1 applies. The uniformly most powerful invariant test, which rejects when

$$\sum_{i=1}^{4} p_i(x, y) \text{ is large,}$$

is therefore uniformly at least as powerful as the maximum likelihood ratio test, which rejects when

$$\max\left[p_1(x, y), p_2(x, y), p_3(x, y), p_4(x, y)\right] \text{ is large.}$$
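A simulation sketch of this example compares the two tests at $\alpha = 0.05$, assuming the hypothetical density $f(x, y) = \tfrac{9}{4}\sqrt{xy}$, which is strictly increasing in both variables and, like the concave case of Theorem 2.2, concave in each. Because both test statistics are invariant under $G$, the power against $p_1$ is also the power against $p_2, p_3, p_4$, so only $p_1$ is simulated. Function names are hypothetical.

```python
import numpy as np

def alternatives(x, y):
    # Hypothetical f(x, y) = (9/4) sqrt(x y) and its four reflections p1..p4.
    def f(u, v):
        return 2.25 * np.sqrt(u * v)
    return np.stack([f(x, y), f(1 - x, y), f(x, 1 - y), f(1 - x, 1 - y)])

def compare_power(alpha=0.05, reps=200_000, seed=1):
    rng = np.random.default_rng(seed)
    # Under H the density is 1, so the ratios are the alternative densities.
    p = alternatives(rng.uniform(size=reps), rng.uniform(size=reps))
    c_sum = np.quantile(p.sum(axis=0), 1 - alpha)
    c_max = np.quantile(p.max(axis=0), 1 - alpha)
    # Sample from p1: X and Y are independent with density 1.5 sqrt(u),
    # obtained by the inverse-CDF transform U**(2/3).
    q = alternatives(rng.uniform(size=reps) ** (2 / 3),
                     rng.uniform(size=reps) ** (2 / 3))
    return np.mean(q.sum(axis=0) > c_sum), np.mean(q.max(axis=0) > c_max)

power_sum, power_max = compare_power()
print(f"sum test: {power_sum:.4f}, max test: {power_max:.4f}")
```

For this choice of $f$ the sum test rejects near the center of the square and the maximum test near its corners, so the maximum likelihood ratio test ends up with power below $\alpha$, much as in Theorem 2.2(ii)(b).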
4. Location-scale families



In the present section we shall consider some more classical problems in which the symmetries are represented by infinite groups which are not compact. As a simple example let the hypothesis $H$ and the alternatives $K$ be specified respectively by

$$H: f(x_1 - \theta, \dots, x_n - \theta) \quad \text{and} \quad K: g(x_1 - \theta, \dots, x_n - \theta) \qquad (4.1)$$

where $f$ and $g$ are given densities and $\theta$ is an unknown location parameter. We might for example want to test a normal distribution with unknown mean against a logistic or Cauchy distribution with unknown center.

The symmetry in this problem is characterized by the invariance of $H$ and $K$ under the transformations

$$X_i' = X_i + c \quad (i = 1, \dots, n). \qquad (4.2)$$

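It can be shown that there exists a uniformly most powerful invariant test which rejects $H$ when

$$\frac{\int_{-\infty}^{\infty} g(x_1 - \theta, \dots, x_n - \theta)\, d\theta}{\int_{-\infty}^{\infty} f(x_1 - \theta, \dots, x_n - \theta)\, d\theta} \text{ is large.} \qquad (4.3)$$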

The method of proof used for Theorem 2.1, which also works for Theorem 3.1, fails in the present case, since the numerator and denominator are no longer averages. For the same reason the term average likelihood ratio is no longer appropriate and is replaced by integrated likelihood. However, an easy alternative proof is given, for example, in Lehmann ([5], Section 6.3).

In contrast to (4.3), the maximum likelihood ratio test rejects when

$$\frac{g(x_1 - \hat\theta_1, \dots, x_n - \hat\theta_1)}{f(x_1 - \hat\theta_0, \dots, x_n - \hat\theta_0)} \text{ is large,} \qquad (4.4)$$

where $\hat\theta_1$ and $\hat\theta_0$ are the maximum likelihood estimators of $\theta$ under $g$ and $f$ respectively. Since (4.4) is also invariant under the transformations (4.2), it follows that the test (4.3) is uniformly at least as powerful as (4.4), and in fact more powerful unless the two tests coincide, which will happen only in special cases.
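Both statistics can be approximated numerically on a grid of $\theta$ values. The sketch below assumes one instance of the example mentioned above, a standard normal $f$ and a standard Cauchy $g$ with iid components; the grid width and size are arbitrary choices, the integrals are simple Riemann sums, and the maximum likelihood estimates are only grid approximations. The grid must extend far enough that the likelihoods are negligible beyond it, which for the heavy-tailed Cauchy requires $n$ not too small. All function names are hypothetical.

```python
import numpy as np

def log_normal(z):
    return -0.5 * z**2 - 0.5 * np.log(2 * np.pi)

def log_cauchy(z):
    return -np.log(np.pi) - np.log1p(z**2)

def log_lik_on_grid(x, log_density, grid):
    # log prod_i density(x_i - theta), for every theta in the grid
    return log_density(x[None, :] - grid[:, None]).sum(axis=1)

def log_integrated(ll, step):
    # log of a Riemann sum of exp(ll), computed stably (log-sum-exp)
    m = ll.max()
    return m + np.log(np.exp(ll - m).sum() * step)

def location_statistics(x, log_f=log_normal, log_g=log_cauchy, width=20.0, points=8001):
    """Approximate logs of the integrated ratio (4.3) and the maximized ratio (4.4)."""
    x = np.asarray(x, dtype=float)
    grid = np.linspace(x.min() - width, x.max() + width, points)
    step = grid[1] - grid[0]
    ll_f = log_lik_on_grid(x, log_f, grid)
    ll_g = log_lik_on_grid(x, log_g, grid)
    integrated = log_integrated(ll_g, step) - log_integrated(ll_f, step)
    maximized = ll_g.max() - ll_f.max()  # grid approximation to the MLEs
    return integrated, maximized

x = np.array([0.3, -1.2, 0.8, 2.1, -0.4])
print(location_statistics(x))
print(location_statistics(x + 3.7))  # same values up to rounding
```

Shifting every observation by the same constant leaves both returned values unchanged up to rounding, in line with their invariance under (4.2).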

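The situation is quite similar for scale instead of location families. The problem (4.1) is now replaced by

$$H: \frac{1}{\tau^n} f\left(\frac{x_1}{\tau}, \dots, \frac{x_n}{\tau}\right) \quad \text{and} \quad K: \frac{1}{\tau^n} g\left(\frac{x_1}{\tau}, \dots, \frac{x_n}{\tau}\right) \qquad (4.5)$$

where either the $x$'s are all positive or $f$ and $g$ are symmetric about 0 in each variable.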

This problem remains invariant under the transformations

$$X_i' = c X_i, \quad c > 0. \qquad (4.6)$$

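It can be shown that a uniformly most powerful invariant test exists and rejects $H$ when

$$\frac{\int_0^\infty \nu^{n-1} g(\nu x_1, \dots, \nu x_n)\, d\nu}{\int_0^\infty \nu^{n-1} f(\nu x_1, \dots, \nu x_n)\, d\nu} \text{ is large.} \qquad (4.7)$$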
On the other hand, the maximum likelihood ratio test rejects when

$$\frac{\hat\tau_1^{-n}\, g(x_1/\hat\tau_1, \dots, x_n/\hat\tau_1)}{\hat\tau_0^{-n}\, f(x_1/\hat\tau_0, \dots, x_n/\hat\tau_0)} \text{ is large,} \qquad (4.8)$$

where $\hat\tau_1$ and $\hat\tau_0$ are the maximum likelihood estimators of $\tau$ under $g$ and $f$ respectively. Since it is invariant under the transformations (4.6), the test (4.8) is less powerful than (4.7) unless they coincide.
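For some pairs of densities both (4.7) and (4.8) are available in closed form. The sketch below assumes a hypothetical pair that is not taken from the text: $f$ the standard normal and $g$ the standard Laplace density $\tfrac{1}{2}e^{-|z|}$, both symmetric about 0, with iid components. With $S = \sum x_i^2$ and $A = \sum |x_i|$, the integrals in (4.7) reduce to gamma functions, and the maximum likelihood estimators are $\hat\tau_0^2 = S/n$ and $\hat\tau_1 = A/n$.

```python
from math import lgamma, log, pi

import numpy as np

def scale_statistics(x):
    """Logs of the integrated ratio (4.7) and the maximized ratio (4.8).

    Hypothetical pair: f = standard normal, g = standard Laplace exp(-|z|)/2.
    With S = sum x_i**2 and A = sum |x_i| one has
      int_0^inf v**(n-1) prod f(v x_i) dv = (2 pi)**(-n/2) Gamma(n/2) (2/S)**(n/2) / 2,
      int_0^inf v**(n-1) prod g(v x_i) dv = 2**(-n) Gamma(n) A**(-n)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    S = float(np.sum(x**2))
    A = float(np.sum(np.abs(x)))
    log_int_f = -0.5 * n * log(2 * pi) + lgamma(n / 2) + 0.5 * n * log(2 / S) - log(2)
    log_int_g = -n * log(2) + lgamma(n) - n * log(A)
    # Maximized likelihoods tau**(-n) prod f(x_i / tau), at tau0**2 = S/n and tau1 = A/n.
    log_max_f = -0.5 * n * log(2 * pi * S / n) - 0.5 * n
    log_max_g = -n * log(2 * A / n) - n
    return log_int_g - log_int_f, log_max_g - log_max_f

rng = np.random.default_rng(0)
for _ in range(3):
    integrated, maximized = scale_statistics(rng.standard_normal(6))
    print(integrated - maximized)  # the same value each time, up to rounding
```

Here both log statistics equal $n \log(\sqrt{S}/A)$ plus a constant depending only on $n$, so the two tests coincide: this pair is one of the special cases in which (4.7) gains nothing over (4.8).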

As in (4.3), the test (4.7) involves an integrated likelihood, but while in (4.3) the parameter $\theta$ was integrated with respect to Lebesgue measure, the nuisance parameter in (4.7) is integrated with respect to $\nu^{n-1}\, d\nu$. A crucial feature which all the examples of Sections 2-4 have in common is that the group of transformations that leave $H$ and $K$ invariant is transitive, i.e. that for any two members of $H$ (or of $K$) there exists a transformation which takes one into the other. A general theory of this case is given in Eaton ([3], Sections 6.7 and 6.4).

Elimination of nuisance parameters through integrated likelihood is recommended very generally by Berger, Liseo and Wolpert [1]. When invariance considerations do not apply, they propose integration with respect to non-informative priors over the nuisance parameters (for a review of such prior distributions, see Kass and Wasserman [4]).



5. The failure of intuition 



The examples of the previous sections show that the intuitive appeal of maximum likelihood ratio tests can be misleading. (For related findings see Berger and Wolpert ([2], pp. 125-135).) To understand just how intuition can fail, consider a family of densities $p_\theta$ and the hypothesis $H: \theta = 0$. The Neyman-Pearson lemma tells us that when testing $p_0$ against a specific $p_\theta$, we should reject $y$ in preference to $x$ when

$$\frac{p_\theta(x)}{p_0(x)} < \frac{p_\theta(y)}{p_0(y)};$$

the best test therefore rejects for large values of $p_\theta(x)/p_0(x)$, i.e. it is the maximum likelihood ratio test.

However, when more than one value of $\theta$ is possible, considering only large values of $p_\theta(x)/p_0(x)$ (as is done by the maximum likelihood ratio test) may no longer be the right strategy. Values of $x$ for which the ratio $p_\theta(x)/p_0(x)$ is small now also become important; they may have to be included in the rejection region because $p_{\theta'}(x)/p_0(x)$ is large for some other value $\theta'$.

This is clearly seen in the situation of Theorem 2.2 with $f$ increasing and $g$ decreasing, as illustrated in Fig. 1.

For the values of $x$ for which $f$ is large, $g$ is small, and vice versa. The behavior of the test therefore depends crucially on values of $x$ for which $f(x)$ or $g(x)$ is small, a fact that is completely ignored by the maximum likelihood ratio test.

Note, however, that this phenomenon does not arise when all the alternative densities $f, \dots$ are increasing. When $n = 1$, there then exists a uniformly most powerful test, and it is the maximum likelihood ratio test. This is no longer true when $n > 1$, but even then all reasonable tests, including the maximum likelihood ratio test, will reject the hypothesis in a region where all the observations are large.
Fig. 1.



6. Conclusions 

For the reasons indicated in Section 1, maximum likelihood ratio tests are so widely accepted that they are almost automatically taken as solutions to new testing problems. In many situations they turn out to be very satisfactory, but a collection of examples in which this is not the case has gradually been building up, and it is augmented by those of the present paper.

In particular when the problem remains invariant under a transitive group of 
transformations, a different principle (likelihood averaged or integrated with respect 
to an invariant measure) provides a test which is uniformly at least as good as the 
maximum likelihood ratio test and is better unless the two coincide. From the 
argument in Section 2 it is seen that this superiority is not restricted to invariant 
situations but persists in many other cases. A similar conclusion was reached from 
another point of view by Berger, Liseo and Wolpert [1]. 

The integrated likelihood approach without invariance has the disadvantage of 
not being uniquely defined; it requires the choice of a measure with respect to 
which to integrate. Typically it will also lead to more complicated test statistics. 
Nevertheless, in view of the superiority of integrated over maximum likelihood for
large classes of problems, and the considerable unreliability of maximum likelihood 
ratio tests, further comparative studies of the two approaches would seem highly 
desirable. 